1 Overview

We are strong supporters of open-source tools for reproducible research. Priority index or Pi is developed as a genomic-led target prioritisation system, with the focus on leveraging genetic data to prioritise targets at the gene and pathway level. Based on evidence of genetic association from GWAS data, this prioritisation system is able to resolve modulated genes (seed genes) by utilising knowledge of linkage disequilibrium (co-inherited variants), distance from the gene, and evidence of genetic association with gene expression. Seed genes are scored in an integrative way quantifying the genetic influence. Scored seed genes are subsequently used as baits to rank seed genes plus additional (non-seed) genes; this is achieved by iteratively exploring the global connectivity of a gene interaction network. Genes with the top priority are further used to identify/prioritise pathways that are significantly enriched with highly prioritised genes. Prioritised genes are also used to identify a gene network interconnecting many highly prioritised genes but a few lowly prioritised genes as linkers.

In the Showcases section, we illustrate the power of Pi to perform disease-centric genetic target prioritisation at the gene, pathway and network level using several inflammatory diseases as exemplars, including Ankylosing Spondylitis, Spondyloarthritis, and Systemic Lupus Erythematosus.

In the Ontology annotations & TPN nominations section, we integrate ontology annotations and TPN nominations as an additional support for disease-specific genetic prioritisation by Pi.

In the Cross-disease comparisons section, we summarise and compare results of genetic target prioritisation across diseases, done at the gene, pathway and network level.

2 Web app

A web-based interface of Pi is available here, which allows on-the-fly prioritisation for a user-defined SNP list.

3 Workflow

4 R functions

This intends for end-users who are comfortable with R (with the latest version available at http://cran.r-project.org). Priority functions are all contained in a new package called Pi which can be installed following instructions below:

# install the dependant packages including `XGR` and other packages used here
source("http://bioconductor.org/biocLite.R")
biocLite(c("XGR","devtools","VennDiagram"), siteRepos=c("http://cran.r-project.org"))
library(devtools)
install_github(c("hfang-bristol/XGR", "hfang-bristol/dnet"), dependencies=T)

# also, install the `Pi` package itself
library(devtools)
install_github(c("hfang-bristol/Pi"), dependencies=T)

Priority functions are designed in a nested way. The core relation follows this route: xPrioritiserSNPs -> xPrioritiserGenes -> xPrioritiser -> xRWR, achieving gene-level prioritisation from an input list of SNPs (along with their significant level). The output of this route is taken as the input of either xPrioritiserManhattan for visualising gene priority scores, xPrioritiserPathways for prioritising pathways, or xPrioritiserSubnet for identifying a network of top prioritised genes.

4.1 xRWR

xRWR: implements Random Walk with Restart (RWR) estimating the affinity score of nodes in a graph to a list of seed nodes. The affinity score can be viewed as the influential impact over the graph that is collectively imposed by seed nodes. If the seeds are not given, it will pre-compute affinity matrix for nodes in the input graph with respect to each starting node (as a seed) by looping over every node in the graph. A higher-level function xPrioritiser directly relies on it.

4.2 xPrioritiser

xPrioritiser: uses RWR to calculate the affinity score of nodes in a graph to a list of seed nodes. A node that has a higher affinity score to seed nodes will receive a higher priority score. It is an internal function acting as a general template for RWR-based prioritisation. A higher-level function xPrioritiserGenes directly relies on it.

4.3 xPrioritiserGenes

xPrioritiserGenes: prioritises gene targets from an input gene interaction network and the score info imposed on its seed nodes/genes. This function can be considered to be a specific instance of xPrioritiser, that is, specifying a gene interaction network as a graph and seed nodes as seed genes.

There are two sources of gene interaction network information: the STRING database (Szklarczyk et al. 2015) and the Pathways Commons database (Cerami et al. 2011). STRING is a meta-integration of undirect interactions from a functional aspect, while Pathways Commons mainly contains both undirect and direct interactions from a physical/pathway aspect. In addition to interaction type, users can choose the interactions of varying quality:

Code Interaction (type and quality) Database
STRING_high Functional interactions (with high confidence scores>=700) STRING
STRING_medium Functional interactions (with medium confidence scores>=400) STRING
PCommonsUN_high Physical/undirect interactions (with references & >=2 sources) Pathways Commons
PCommonsUN_medium Physical/undirect interactions (with references & >=1 sources) Pathways Commons
PCommonsDN_high Pathway/direct interactions (with references & >=2 sources) Pathways Commons
PCommonsDN_medium Pathway/direct interactions (with references & >=1 sources) Pathways Commons

4.4 xPrioritiserSNPs

xPrioritiserSNPs: prioritises gene targets from an input gene interaction network and a given list of SNPs together with the significance level (eg GWAS reported p-values). To do so, it first defines seed genes and their scores that are calculated in an integrative manner to quantify the genetic influence under SNPs. Genetic influential score on a seed gene is calculated from the SNP score (reflecting the SNP significant level), the gene-to-SNP distance weight and the eQTL mapping weight. This function can be considered to be a specific instance of xPrioritiserGenes, that is, specifying seed genes plus their scores.

Knowledge of co-inherited variants is also used to include additional SNPs that are in Linkage Disequilibrium (LD) with GWAS lead SNPs. LD SNPs are calculated based on 1000 Genomes Project data (1000 Genomes Project Consortium 2012). LD SNPs are defined to be any SNPs having R2 > 0.8 with GWAS lead SNPs. The user can choose the population used to calculate LD SNPs:

Code Population Project
AFR African 1000 Genomes Project
AMR Admixed American 1000 Genomes Project
EAS East Asian 1000 Genomes Project
EUR European 1000 Genomes Project
SAS South Asian 1000 Genomes Project

4.5 xPrioritiserManhattan

xPrioritiserManhattan: visualises prioritised genes using manhattan plot, in which priority for genes is displayed on the Y-axis along with genomic locations on the X-axis. Also highlighted are genes with the top priority.

4.6 xPrioritiserPathways

xPrioritiserPathways: prioritises pathways based on enrichment analysis of genes with the top priority (eg top 100 genes) using a compendium of pathways. A highly prioritised pathway has its member genes with high gene-level priority scores, that is, having evidence of direct modulation by disease-associated lead (or LD) SNPs, and/or having evidence of indirect modulation at the network level.

In addition to pathways, enrichment analysis can be extended to other ontologies, as listed below:

Code Ontology Category
DO Disease Ontology Disease
GOMF Gene Ontology Molecular Function Function
GOBP Gene Ontology Biological Process Function
GOCC Gene Ontology Cellular Component Function
HPPA Human Phenotype Phenotypic Abnormality Phenotype
HPMI Human Phenotype Mode of Inheritance Phenotype
HPCM Human Phenotype Clinical Modifier Phenotype
HPMA Human Phenotype Mortality Aging Phenotype
MP Mammalian/Mouse Phenotype Phenotype
DGIdb DGI druggable gene categories Druggable
SF SCOP domain superfamilies Domain
PS2 phylostratific age information (our ancestors) Evolution
MsigdbH Hallmark gene sets Hallmark (MsigDB)
MsigdbC1 Chromosome and cytogenetic band positional gene sets Cytogenetics (MsigDB)
MsigdbC2CGP Chemical and genetic perturbation gene sets Perturbation (MsigDB)
MsigdbC2CPall All pathway gene sets Pathways (MsigDB)
MsigdbC2CP Canonical pathway gene sets Pathways (MsigDB)
MsigdbC2KEGG KEGG pathway gene sets Pathways (MsigDB)
MsigdbC2REACTOME Reactome pathway gene sets Pathways (MsigDB)
MsigdbC2BIOCARTA BioCarta pathway gene sets Pathways (MsigDB)
MsigdbC3TFT Transcription factor target gene sets TF targets (MsigDB)
MsigdbC3MIR microRNA target gene sets microRNA targets (MsigDB)
MsigdbC4CGN Cancer gene neighborhood gene sets Cancer (MsigDB)
MsigdbC4CM Cancer module gene sets Cancer (MsigDB)
MsigdbC5BP GO biological process gene sets Function (MsigDB)
MsigdbC5MF GO molecular function gene sets Function (MsigDB)
MsigdbC5CC GO cellular component gene sets Function (MsigDB)
MsigdbC6 Oncogenic signature gene sets Oncology (MsigDB)
MsigdbC7 Immunologic signature gene sets Immunology (MsigDB)

4.7 xPrioritiserSubnet

xPrioritiserSubnet: identifies a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. An iterative procedure of choosing different priority cutoff is also used to identify the network with a desired number of nodes/genes.

5 Showcases

In this section, we give a step-by-step demo of using Pi to carry out disease-specific genetic prioritisation of targets at the gene and pathway level, focusing on three inflammatory diseases as exemplars: Ankylosing Spondylitis (AS), Spondyloarthritis (SA, including AS and Psoriatic Arthritis), and Systemic Lupus Erythematosus (SLE).

First of all, load the packages including Pi:

library(Pi)
# Following packages are also needed
library(XGR)
library(RCircos)
library(VennDiagram)

# optionally, specify the local location of built-in RData
# by default:
RData.location <- "https://github.com/hfang-bristol/RDataCentre/blob/master/Portal"

5.1 Ankylosing Spondylitis

GWAS SNPs are collected mainly from GWAS Catalog (Welter et al. 2014), complemented by ImmunoBase and latest publications.

Import AS-associated GWAS lead SNPs (with the help of Anna Sanniti)

data.file <- "http://galahad.well.ox.ac.uk/bigdata/AS.txt"
AS <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

The first 5 rows of the data frame AS are shown below, with the column SNP for AS GWAS SNPs and the column Pval for GWAS-detected P-values.

SNP Pval
rs10045403 5.8e-14
rs1018326 2.0e-06
rs10440635 3.0e-07
rs10781500 1.0e-06
rs10865331 1.9e-19

5.1.1 Gene-level prioritisation

It includes the following steps:

  1. Define seed genes based on distance-to-SNP window and genetic association with gene expression: nearby genes that are located within 200kb distance window of lead or Linkage Disequilibrium (LD) SNPs that are calculated based on European population data from 1000 Genome Project; expression associated genes (eQTL genes) for which gene expression is, either in a cis- or trans-acting manner, significantly associated with lead or LD SNPs, based on immune-stimulated eQTL data in monocytes (Fairfax et al. 2014).

  2. Score seed genes to quantify the genetic influence under lead or LD SNPs.

  3. Prioritise target genes by estimating their global network connectivity to seed genes. With scored seed genes superposed onto a gene interaction network (see above STRING_high), RWR algorithm is implemented to prioritise candidate target genes based on their network connectivity/affinity to seed genes. As such, a gene that has a higher affinity score to seed genes will receive a higher priority score.

pAS <- xPrioritiserSNPs(data=AS, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pAS$priority, which can be saved into a file AS.genes_priority.txt:

AS_genes <- pAS$priority
write.table(AS_genes, file="AS.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
CAST 1 52.13154 0.08460 1
ERAP2 1 22.33534 0.03626 2
ERAP1 1 20.02777 0.03276 3
B3GNT2 1 12.76706 0.02077 4
IL23R 1 11.19898 0.01823 5
LNPEP 1 10.36132 0.01696 6
DDX39B 1 7.91010 0.01284 7
IL12RB2 1 7.32161 0.01197 8
ETS2 1 7.14037 0.01160 9
PSMG1 1 6.69782 0.01091 10
TBKBP1 1 6.58648 0.01071 11
NFKBIL1 1 6.00768 0.01039 12
KIF21B 1 5.82870 0.01008 13
GPR65 1 5.97141 0.00969 14
C1orf106 1 5.21012 0.00854 15
IL6R 1 4.57166 0.00747 16
TNFRSF1A 1 4.24279 0.00717 17
NPEPPS 1 4.12215 0.00670 18
KPNB1 1 4.09678 0.00666 19
GALC 1 4.01292 0.00654 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_AS <- xPrioritiserManhattan(pAS, highlight.top=20, cex=0.5, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_AS)

5.1.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `AS.pathways_priority.txt`
AS_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(AS_pathways), AS_pathways)
write.table(output, file="AS.pathways_priority.txt", sep="\t", row.names=F)

The top 5 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
IL27-mediated signaling events 26 5 11.00 2.7e-08 6.3e-07 IL12B, TYK2, IL12RB2, TBX21, IL27
IL12-mediated signaling events 63 7 9.63 1.6e-08 6.3e-07 NFKB1, IL12B, TYK2, IL12RB2, IL2RA, NOS2, TBX21
NO2-dependent IL 12 Pathway in NK cells 17 4 11.00 1.1e-07 1.7e-06 IL12B, TYK2, IL12RB2, NOS2
IL23-mediated signaling events 37 5 9.08 2.5e-07 3.0e-06 NFKB1, IL12B, TYK2, NOS2, IL23R
Cytokine-cytokine receptor interaction 266 11 6.55 4.6e-07 4.3e-06 CSF2RB, TNFRSF1A, IL12B, IL6R, IL12RB2, IL2RA, CXCR2, LTBR, IL23R, IL1R2, CD27

This barplot displays the top 5 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- AS_pathways[1:5,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="SF", RData.location=RData.location)
AS_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 8 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
Leukotriene A4 hydrolase N-terminal domain 12 4 15.30 3.6e-09 4.3e-08 LNPEP, ERAP2, NPEPPS, ERAP1
TNF receptor-like 24 3 7.87 8.6e-06 5.2e-05 TNFRSF1A, CD27, LTBR
p53-like transcription factors 43 3 5.67 9.2e-05 3.7e-04 NFKB1, RUNX3, TBX21
Metalloproteases (zincins), catalytic domain 100 4 4.65 2.3e-04 6.3e-04 LNPEP, ERAP2, NPEPPS, ERAP1
Histone-fold 107 4 4.44 3.2e-04 6.3e-04 HIST1H4I, HIST1H3A, HIST1H3C, HIST1H3B
Immunoglobulin 481 9 3.95 3.1e-04 6.3e-04 TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL6R, IL12B
Fibronectin type III 200 5 3.72 8.5e-04 1.5e-03 IL6R, IL12B, IL12RB2, CSF2RB, IL23R
DEATH domain 87 3 3.64 1.4e-03 2.1e-03 NFKB1, CARD9, TNFRSF1A

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="GOMF", RData.location=RData.location)
AS_gomf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
Leukotriene A4 hydrolase N-terminal domain 12 4 15.300 3.6e-09 4.3e-08 LNPEP, ERAP2, NPEPPS, ERAP1
TNF receptor-like 24 3 7.870 8.6e-06 5.2e-05 TNFRSF1A, CD27, LTBR
p53-like transcription factors 43 3 5.670 9.2e-05 3.7e-04 NFKB1, RUNX3, TBX21
Metalloproteases (zincins), catalytic domain 100 4 4.650 2.3e-04 6.3e-04 LNPEP, ERAP2, NPEPPS, ERAP1
Histone-fold 107 4 4.440 3.2e-04 6.3e-04 HIST1H4I, HIST1H3A, HIST1H3C, HIST1H3B
Immunoglobulin 481 9 3.950 3.1e-04 6.3e-04 TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL6R, IL12B
Fibronectin type III 200 5 3.720 8.5e-04 1.5e-03 IL6R, IL12B, IL12RB2, CSF2RB, IL23R
DEATH domain 87 3 3.640 1.4e-03 2.1e-03 NFKB1, CARD9, TNFRSF1A
Homeodomain-like 294 3 1.080 8.1e-02 1.1e-01 NKX2-3, MIER1, SNAPC4
PH domain-like 420 3 0.442 2.1e-01 2.5e-01 TYK2, PLEKHG6, SH2B3

5.1.3 Network-level prioritisation

Network-level prioritisation is to identify a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. Given a gene network (the same as used in gene-level prioritisation) with nodes labelled with gene priority scores, the search for a maximum-scoring gene subnetwork is briefed as follows:

  1. score transformation, that is, given a threshold of tolerable priority scores, nodes above this threshold (nodes of interest) are scored positively, and negative scores for nodes below the threshold (intolerable).

  2. subnetwork identification, that is, to find an interconnected gene subnetwork enriched with positive-score nodes but allowing for a few negative-score nodes as linkers, done via heuristically solving prize-collecting Steiner tree problem (Fang and Gough 2014).

  3. controlling the subnetwork size, that is, an iterative procedure of scanning different priority thresholds is used to identify the network with a desired number of nodes/genes.

Notably, the preferential use of the same network as used in gene-level prioritisation is due to the fact that gene-level affinity/priority scores are smoothly distributed over the network after being walked. In other words, the chance of identifying such a gene network enriched with top prioritised genes is much higher. To reduce the runtime, by default only top 10% prioritised genes will be used to search for the maximum-scoring gene network.

# find maximum-scoring gene network with the desired node number=50
AS_net <- xPrioritiserSubnet(pNode=pAS, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- AS_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
IL12-mediated signaling events 63 11 19.10 1.1e-16 6.7e-15 NFKB1, LCK, B2M, IL12B, SOCS1, TYK2, IL12RB2, IL1R1, IL2RA, NOS2, TBX21
IL27-mediated signaling events 26 5 13.50 2.4e-09 7.0e-08 IL12B, TYK2, IL12RB2, TBX21, IL27
Cytokine-cytokine receptor interaction 266 11 8.52 4.2e-09 7.0e-08 CSF2RB, TNFRSF1A, IL12B, IL6R, IL12RB2, IL1R1, IL2RA, CXCR1, CXCR2, IL23R, IL1R2
Cytokine Signaling in Immune system 268 11 8.48 4.6e-09 7.0e-08 CSF2RB, ICAM1, LCK, B2M, SOCS1, IL6R, TYK2, IL1R1, IL2RA, IL1R2, UBA52
NO2-dependent IL 12 Pathway in NK cells 17 4 13.50 1.5e-08 1.8e-07 IL12B, TYK2, IL12RB2, NOS2
IL23-mediated signaling events 37 5 11.20 2.3e-08 2.4e-07 NFKB1, IL12B, TYK2, NOS2, IL23R
Signaling by Interleukins 106 7 8.95 4.3e-08 3.8e-07 CSF2RB, LCK, IL6R, TYK2, IL1R1, IL2RA, IL1R2
Jak-STAT signaling pathway 155 8 8.28 5.6e-08 4.2e-07 CSF2RB, IL12B, SOCS1, IL6R, TYK2, IL12RB2, IL2RA, IL23R
Hematopoietic cell lineage 87 6 8.48 2.1e-07 1.4e-06 ITGB3, IL6R, IL1R1, IL2RA, IL1R2, CD9
Immune System 912 16 5.66 7.6e-07 4.6e-06 CSF2RB, ICAM1, LCK, B2M, SOCS1, IL6R, TYK2, IL1R1, IL2RA, ICAM3, IL1R2, LNPEP, UBA52, ICAM4, ERAP1, NPEPPS

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
Neutral zinc metallopeptidase 182 4 3.56 0.00120 0.0033 ERAP2, LNPEP, NPEPPS, ERAP1
Cell surface 467 7 3.50 0.00081 0.0033 TNFRSF1A, B2M, SCNN1A, ICAM1, CD9, IL1R1, CXCR2
External side of plasma membrane 189 4 3.46 0.00140 0.0033 B2M, SCNN1A, ICAM1, CD9
Drug resistance 349 5 2.80 0.00410 0.0071 B2M, ICAM1, SOCS1, LCK, PDE4A
Protease 622 5 1.37 0.05600 0.0790 ERAP2, LNPEP, NPEPPS, ERAP1, CAPNS1

5.2 Spondyloarthritis

Import SA-associated GWAS lead SNPs (with the help of Anna Sanniti and Alicia Lledo Lara)

data.file <- "http://galahad.well.ox.ac.uk/bigdata/Spondyloarthritis.txt"
SA <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

5.2.1 Gene-level prioritisation

pSA <- xPrioritiserSNPs(data=SA, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pSA$priority, which can be saved into a file SA.genes_priority.txt:

SA_genes <- pSA$priority
write.table(SA_genes, file="SA.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
CAST 1 52.13154 0.02679 1
DDX39B 1 43.38974 0.02230 2
ATF6B 1 37.30371 0.01918 3
NFKBIL1 1 33.00396 0.01806 4
IER3 1 22.88007 0.01174 5
ERAP2 1 22.33534 0.01148 6
ERAP1 1 20.02777 0.01037 7
TRAF3IP2 1 20.03787 0.01030 8
MDC1 1 19.39800 0.00998 9
TNIP1 1 18.67828 0.00967 10
ANXA6 1 17.98198 0.00925 11
VARS2 1 17.75845 0.00913 12
LCE3D 1 14.03913 0.00903 13
LCE3A 1 13.93991 0.00849 14
LCE3B 1 13.99650 0.00818 15
LCE3C 1 12.22289 0.00802 16
LCE3E 1 12.45889 0.00751 17
TYK2 1 12.86998 0.00672 18
B3GNT2 1 12.76706 0.00657 19
CNPY2 1 12.65046 0.00652 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_SA <- xPrioritiserManhattan(pSA, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SA)

5.2.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SA.pathways_priority.txt`
SA_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SA_pathways), SA_pathways)
write.table(output, file="SA.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
NO2-dependent IL 12 Pathway in NK cells 17 4 12.30 3.5e-08 1.0e-06 IL12B, TYK2, IL12RB2, NOS2
IL23-mediated signaling events 37 5 10.30 6.5e-08 1.0e-06 IL12B, TYK2, NOS2, IL23A, IL23R
IL27-mediated signaling events 26 4 9.84 3.6e-07 3.8e-06 STAT2, IL12B, TYK2, IL12RB2
IL12 and Stat4 Dependent Signaling Pathway in Th1 Development 23 3 7.79 9.0e-06 7.2e-05 IL12B, TYK2, IL12RB2
Beta2 integrin cell surface interactions 29 3 6.85 2.3e-05 1.5e-04 ICAM1, ICAM3, ICAM4
IL12-mediated signaling events 63 4 5.97 3.3e-05 1.5e-04 IL12B, TYK2, IL12RB2, NOS2
Jak-STAT signaling pathway 155 6 5.36 3.2e-05 1.5e-04 STAT2, IL12B, TYK2, IL12RB2, IL23A, IL23R
Allograft rejection 37 3 5.97 6.3e-05 2.2e-04 IL12B, HLA-E, HLA-DQA1
Viral myocarditis 71 4 5.55 5.8e-05 2.2e-04 FYN, ICAM1, HLA-E, HLA-DQA1
Type I diabetes mellitus 43 3 5.47 1.1e-04 3.7e-04 IL12B, HLA-E, HLA-DQA1

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- SA_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="SF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 6 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
Leukotriene A4 hydrolase N-terminal domain 12 3 12.50 2.0e-07 1.8e-06 LNPEP, ERAP2, ERAP1
Metalloproteases (zincins), catalytic domain 100 3 3.75 1.2e-03 4.0e-03 LNPEP, ERAP2, ERAP1
SH2 domain 112 3 3.46 1.8e-03 4.0e-03 TYK2, STAT2, FYN
Immunoglobulin 481 7 3.26 1.6e-03 4.0e-03 HLA-DQA1, HLA-E, ICAM1, ICAM3, ICAM4, ICAM5, IL12B
Fibronectin type III 200 3 2.17 1.4e-02 2.5e-02 IL12B, IL12RB2, IL23R
WD40 repeat-like 281 3 1.51 4.1e-02 5.8e-02 KIF21B, PAN2, WDR83

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="GOMF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
aminopeptidase activity 28 3 7.67 1.1e-05 0.00019 ERAP2, LNPEP, ERAP1
integrin binding 105 4 4.83 1.8e-04 0.00160 ICAM1, ICAM3, ICAM4, ICAM5
ATP-dependent RNA helicase activity 63 3 4.81 2.7e-04 0.00160 DHX30, SKIV2L, DDX39B
RNA binding 404 5 2.13 1.5e-02 0.07000 RPL34, SETD1A, ILF3, SKIV2L, PUS10
receptor binding 321 4 1.92 2.2e-02 0.08000 HLA-E, ICAM3, NOS2, APOF
signal transducer activity 227 3 1.77 2.7e-02 0.08100 STAT2, SLC44A2, KCNH7
poly(A) RNA binding 1118 9 1.51 5.1e-02 0.13000 MRPL4, RAVER1, DHX30, SSRP1, XRCC6, ILF3, CAST, DDX39B, CS
nucleotide binding 304 3 1.22 6.6e-02 0.15000 RAVER1, REV3L, SETD1A
protein serine/threonine kinase activity 325 3 1.09 8.0e-02 0.16000 SQSTM1, CSNK2A1, BCKDK
ATP binding 1454 10 1.07 1.1e-01 0.19000 FYN, CSNK2A1, DHX30, XRCC6, SKIV2L, KIF21B, BCKDK, VARS2, DDX39B, TYK2

5.2.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=50
SA_net <- xPrioritiserSubnet(pNode=pSA, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- SA_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Beta2 integrin cell surface interactions 29 7 17.10 2.0e-12 1.9e-10 ICAM1, ITGAL, ITGAX, ITGAM, ICAM3, FCGR2A, ICAM4
Integrin cell surface interactions 79 8 11.50 3.8e-10 1.4e-08 ITGB1, ICAM1, ITGAL, ITGAX, ITGB3, ITGAM, ICAM3, ICAM4
Jak-STAT signaling pathway 155 10 9.93 6.0e-10 1.4e-08 IL4, STAT2, IL12B, IL13, IL6R, TYK2, IL12RB2, CTF1, IL23A, IL23R
Immune System 912 21 7.46 5.9e-10 1.4e-08 NFKBIA, ITGB1, RPS6KA1, STAT2, FYN, ICAM1, ITGAL, TNFAIP3, IL6R, TYK2, PSMA6, ICAM3, DDX58, RNF41, LNPEP, UBA52, REL, SQSTM1, ICAM4, ERAP1, NPEPPS
IL23-mediated signaling events 37 6 12.80 9.9e-10 1.9e-08 NFKBIA, IL12B, TYK2, NOS2, IL23A, IL23R
Monocyte and its Surface Molecules 11 4 15.90 2.0e-09 3.1e-08 ITGB1, ICAM1, ITGAL, ITGAM
IL27-mediated signaling events 26 5 12.80 4.7e-09 5.0e-08 STAT2, IL12B, TYK2, IL12RB2, TBX21
Integrin family cell surface interactions 26 5 12.80 4.7e-09 5.0e-08 ITGB1, ITGAL, ITGAX, ITGB3, ITGAM
Leishmania infection 72 7 10.50 4.8e-09 5.0e-08 NFKBIA, IL4, ITGB1, IL12B, ITGAM, NOS2, FCGR2A
NO2-dependent IL 12 Pathway in NK cells 17 4 12.70 2.6e-08 2.4e-07 IL12B, TYK2, IL12RB2, NOS2

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
External side of plasma membrane 189 7 6.75 1.2e-06 1.2e-05 SCNN1A, ITGAL, ITGAX, ITGB1, ICAM1, IL4, IL13
Cell surface 467 9 4.93 2.1e-05 1.1e-04 TNFRSF1A, SCNN1A, STX4, ITGAL, ITGAX, ITGB1, ICAM1, IL4, IL13
Tumor suppressor 716 9 3.40 7.7e-04 2.6e-03 ITGB1, TNFAIP3, UBA52, IL12B, CSNK2A1, PSMA6, ETS1, CDC37, RUNX3
Protease inhibitor 179 3 2.47 8.1e-03 1.7e-02 TNFAIP3, RPS6KA1, CAST
Neutral zinc metallopeptidase 182 3 2.44 8.6e-03 1.7e-02 LNPEP, NPEPPS, ERAP1

5.3 Systemic Lupus Erythematosus

Import SLE-associated GWAS lead SNPs

data.file <- "http://galahad.well.ox.ac.uk/bigdata/SLE.txt"
SLE <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

5.3.1 Gene-level prioritisation

pSLE <- xPrioritiserSNPs(data=SLE, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pSLE$priority, which can be saved into a file SLE.genes_priority.txt:

SLE_genes <- pSLE$priority
write.table(SLE_genes, file="SLE.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
ATF6B 1 149.00804 0.04358 1
NELFE 1 79.37208 0.02335 2
VARS2 1 69.41208 0.02029 3
ITGAM 1 63.64914 0.01884 4
PBX2 1 60.75875 0.01783 5
PYCARD 1 59.68441 0.01753 6
CUEDC2 1 59.78891 0.01746 7
STAT4 1 57.81733 0.01707 8
HLA-DPB1 1 52.92738 0.01576 9
IRF5 1 52.36363 0.01540 10
TNPO3 1 52.36363 0.01533 11
CLIC1 1 52.07534 0.01527 12
PSMB9 1 50.63699 0.01482 13
RNF114 1 48.59792 0.01427 14
C2 1 46.16449 0.01354 15
LTA 1 42.43087 0.01303 16
SKIV2L 1 42.63385 0.01275 17
NCF2 1 41.58059 0.01219 18
HNRNPM 1 41.47880 0.01217 19
SMG7 1 38.82899 0.01137 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_SLE <- xPrioritiserManhattan(pSLE, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SLE)

5.3.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SLE.pathways_priority.txt`
SLE_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SLE_pathways), SLE_pathways)
write.table(output, file="SLE.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Beta2 integrin cell surface interactions 29 5 10.50 5.0e-08 1.4e-06 ITGAX, ITGAM, ICAM3, FCGR2A, ITGAD
Leishmania infection 72 7 9.00 4.3e-08 1.4e-06 STAT1, NCF2, ITGAM, HLA-DOB, HLA-DPB1, HLA-DQA1, FCGR2A
Immune System 912 21 5.90 2.1e-07 3.9e-06 CD80, STAT1, TNFAIP3, C2, C4A, CFB, NCF2, TYK2, SKP1, UBE2L3, HLA-DOB, HLA-DPB1, HLA-DQA1, ICAM3, PYCARD, PSMB9, IFIH1, IRF5, BLK, IRF8, SQSTM1
Type I diabetes mellitus 43 5 8.42 5.9e-07 8.4e-06 CD80, LTA, HLA-DOB, HLA-DPB1, HLA-DQA1
Initial triggering of complement 13 3 9.47 1.8e-06 1.7e-05 C2, C4A, CFB
Regulation of Complement cascade 13 3 9.47 1.8e-06 1.7e-05 C2, C4A, CFB
IL22 Soluble Receptor Signaling Pathway 16 3 8.48 4.4e-06 3.6e-05 STAT1, STAT4, TYK2
Interferon gamma signaling 62 5 6.81 5.3e-06 3.8e-05 STAT1, HLA-DPB1, HLA-DQA1, IRF5, IRF8
Allograft rejection 37 4 7.22 6.4e-06 3.9e-05 CD80, HLA-DOB, HLA-DPB1, HLA-DQA1
Systemic lupus erythematosus 139 7 6.01 6.8e-06 3.9e-05 CD80, C2, C4A, HLA-DOB, HLA-DPB1, HLA-DQA1, FCGR2A

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- SLE_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="SF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 8 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
vWA-like 95 6 7.33 1.3e-06 2.0e-05 CFB, C2, ITGAD, ITGAM, ITGAX, XRCC6
Integrin alpha N-terminal domain 23 3 7.82 8.9e-06 6.3e-05 ITGAD, ITGAM, ITGAX
Integrin domains 25 3 7.47 1.3e-05 6.3e-05 ITGAD, ITGAM, ITGAX
MHC antigen-recognition domain 40 3 5.73 8.6e-05 3.2e-04 HLA-DOB, HLA-DPB1, HLA-DQA1
TNF-like 53 3 4.84 2.6e-04 7.8e-04 LTB, TNFSF4, LTA
SH2 domain 112 4 4.15 5.0e-04 1.3e-03 TYK2, STAT1, STAT4, BLK
RING/U-box 334 5 2.20 1.4e-02 2.9e-02 SQSTM1, TRIM72, PPIL2, RNF114, PHRF1
Immunoglobulin 481 6 1.93 2.3e-02 3.8e-02 HLA-DOB, HLA-DPB1, HLA-DQA1, ICAM3, CD80, FCGR2A

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="GOMF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
tumor necrosis factor receptor binding 30 4 9.30 6.7e-07 1.5e-05 STAT1, LTA, LTB, TNFSF4
ATP-dependent RNA helicase activity 63 3 4.43 4.5e-04 5.2e-03 DHX30, SKIV2L, DDX39B
receptor binding 321 6 3.13 2.3e-03 1.8e-02 HSPA1A, ICAM3, LTA, BLK, LTB, TNFSF4
actin filament binding 113 3 2.96 3.9e-03 2.3e-02 AIF1, ARPC5, FLNC
sequence-specific DNA binding transcription factor activity 825 10 2.53 6.9e-03 3.2e-02 IKZF1, PTTG1, MECP2, ETS1, ATF6B, PBX2, PHTF1, STAT1, STAT4, IRF5
cytokine activity 172 3 2.06 1.7e-02 5.9e-02 LTA, LTB, TNFSF4
transcription factor binding 266 4 2.04 1.8e-02 5.9e-02 MECP2, ETS1, PBX2, GPX3
ligase activity 283 4 1.91 2.3e-02 6.5e-02 RNF114, UBE2L3, PPIL2, TNFAIP3
sequence-specific DNA binding 407 5 1.79 2.8e-02 6.6e-02 IKZF1, ETS1, ATF6B, PBX2, STAT4
ATP binding 1454 13 1.74 3.4e-02 6.6e-02 UBE2L3, IFIH1, HSPA1A, DHX30, BLK, XRCC6, SKIV2L, RENBP, VARS2, DDX39B, NADSYN1, NMNAT2, TYK2

5.3.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=50
SLE_net <- xPrioritiserSubnet(pNode=pSLE, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- SLE_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Immune System 912 22 8.63 4.5e-12 4.9e-10 CD80, HRAS, STAT1, FCGR2B, CSK, CDKN1B, NCF2, SOCS1, IRAK1, TYK2, CD44, SKP1, RASGRP3, IRF7, PYCARD, FCGR3A, RASGRP1, UBA52, IRF5, BLK, IRF8, SQSTM1
Leishmania infection 72 8 12.80 5.7e-11 3.1e-09 IL10, STAT1, TNF, NCF2, ITGAM, IRAK1, FCGR2A, FCGR3A
Cytokine Signaling in Immune system 268 12 9.36 3.4e-10 1.2e-08 HRAS, STAT1, SOCS1, IRAK1, TYK2, CD44, SKP1, IRF7, UBA52, IRF5, IRF8, SQSTM1
IL-10 Anti-inflammatory Signaling Pathway 17 4 13.50 1.5e-08 4.1e-07 IL10, STAT1, STAT4, TNF
Interferon gamma signaling 62 6 10.30 1.9e-08 4.3e-07 STAT1, SOCS1, CD44, IRF7, IRF5, IRF8
Interferon alpha/beta signaling 64 6 10.10 2.4e-08 4.5e-07 STAT1, SOCS1, TYK2, IRF7, IRF5, IRF8
Interferon Signaling 158 8 8.19 6.6e-08 1.0e-06 STAT1, SOCS1, TYK2, CD44, IRF7, UBA52, IRF5, IRF8
IL27-mediated signaling events 26 4 10.80 1.5e-07 1.7e-06 STAT1, STAT4, TNF, TYK2
Signaling by the B Cell Receptor (BCR) 123 7 8.20 1.4e-07 1.7e-06 HRAS, CDKN1B, SKP1, RASGRP3, RASGRP1, UBA52, BLK
Adaptive Immune System 524 13 6.61 1.4e-07 1.7e-06 CD80, HRAS, FCGR2B, CSK, CDKN1B, NCF2, SOCS1, SKP1, RASGRP3, FCGR3A, RASGRP1, UBA52, BLK

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
Clinically actionable 237 6 4.30 0.00020 0.0017 CDKN1B, SOCS1, TP53, HRAS, PRDM1, STAT4
Histone modification 249 6 4.14 0.00028 0.0017 TP53, CDK9, HCFC1, KAT8, MECP2, SKP1
External side of plasma membrane 189 5 4.04 0.00042 0.0017 TNF, ITGAX, CD80, FCGR3A, CD44
Drug resistance 349 6 3.13 0.00210 0.0062 TNF, SOCS1, TP53, IL10, GPX3, NCF2
Tumor suppressor 716 9 2.82 0.00320 0.0076 TNF, CDKN1B, TP53, HRAS, CDK9, UBA52, HCFC1, ETS1, CDC37

5.4 Dupuytren’s disease (DD)

Import DD-associated GWAS lead SNPs

data.file <- "http://galahad.well.ox.ac.uk/bigdata/Dupuytren.txt"
DD <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

The 13 DD-associated SNPs are:

SNPs Pvalue
rs16879765 6e-39
rs6519955 3e-33
rs611744 8e-15
rs11672517 7e-14
rs2912522 2e-13
rs8124695 8e-10
rs7524102 3e-09
rs10809650 6e-09
rs4730775 3e-08
rs2179367 3e-07
rs4789939 6e-07
rs4932194 8e-07
rs12032381 6e-06

5.4.1 Gene-level prioritisation

pDD <- xPrioritiserSNPs(data=DD, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pDD$priority, which can be saved into a file DD.genes_priority.txt:

DD_genes <- pDD$priority
write.table(DD_genes, file="DD.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
PRR34 1 11.96220 0.11660 1
SFRP4 1 12.43433 0.11429 2
NME8 1 10.15061 0.09302 3
WNT7B 1 8.90219 0.08214 4
EIF3E 1 4.89795 0.04480 5
DUXA 1 4.42695 0.04138 6
RSPO2 1 4.50962 0.04136 7
ZIM3 1 3.52149 0.03572 8
USP29 1 3.01678 0.03165 9
BPIFC 0 0.00000 0.02915 10
PPARA 1 2.79592 0.02560 11
TIMP2 1 2.23830 0.02052 12
AURKC 1 2.04431 0.01872 13
WNT2 1 1.61094 0.01569 14
ZC3H12D 1 1.04283 0.00953 15
NUP43 1 0.96004 0.00878 16
ZBTB40 1 0.91339 0.00849 17
PCMT1 1 0.91664 0.00838 18
ZNF460 1 0.82780 0.00758 19
TAB2 1 0.80465 0.00737 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_DD <- xPrioritiserManhattan(pDD, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_DD)

5.4.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `DD.pathways_priority.txt`
DD_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(DD_pathways), DD_pathways)
write.table(output, file="DD.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Basal cell carcinoma 55 29 47.4 0.0e+00 0.0e+00 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
Melanogenesis 101 29 34.7 0.0e+00 0.0e+00 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
Wnt signaling pathway 151 32 31.1 0.0e+00 0.0e+00 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK2, DKK1
Class B/2 (Secretin family receptors) 87 27 34.8 0.0e+00 0.0e+00 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3
Genes related to Wnt-mediated signal transduction 89 26 33.1 0.0e+00 0.0e+00 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT11, WNT2B, FZD4, WNT9A, FZD6, FZD7, FZD2, FZD8, WNT10A, WNT3A, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK1
Wnt signaling network 28 17 38.9 0.0e+00 0.0e+00 WNT1, FZD1, WNT2, WNT3, WNT5A, WNT7A, WNT7B, FZD4, FZD6, FZD7, FZD2, FZD8, FZD9, WNT3A, FZD10, FZD5, DKK1
Hedgehog signaling pathway 56 19 30.5 0.0e+00 0.0e+00 WNT1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B
Pathways in cancer 325 29 18.5 0.0e+00 0.0e+00 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
GPCR ligand binding 400 27 15.2 0.0e+00 0.0e+00 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3
Genes encoding secreted soluble factors 344 21 12.6 1.4e-16 3.2e-16 WNT1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B, SFRP4, MEGF6

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- DD_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="SF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 6 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
Frizzled cysteine-rich domain 20 11 34.00 0.0e+00 0.0e+00 SFRP4, FZD1, FZD4, FZD6, FZD7, FZD8, FZD10, FZD9, FZD3, FZD5, FZD2
Thioredoxin-like 127 12 14.10 6.8e-14 3.1e-13 TXNL1, TXNRD1, TXNRD3, PRDX2, PRDX1, NME8, PRDX3, TXNDC2, PRDX5, TXNDC8, TXN2, PRDX4
FAD/NAD(P)-binding domain 54 3 5.17 1.7e-04 5.1e-04 TXNRD1, TXNRD3, TXNRD2
Growth factor receptor domain 127 4 4.15 5.0e-04 1.1e-03 RSPO4, MEGF6, RSPO3, RSPO2
Cysteine proteinases 149 3 2.56 7.4e-03 1.3e-02 USP5, USP29, USP36
Homeodomain-like 294 4 2.04 1.8e-02 2.7e-02 DUXA, ARGFX, TPRX1, LEUTX

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="GOMF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
frizzled binding 37 23 48.10 0.0e+00 0.0e+00 WNT5A, ZNRF3, WNT3A, WNT1, WNT3, RSPO3, WNT7A, WNT5B, FZD1, FZD7, WNT4, WNT11, WNT2, WNT8A, WNT10B, WNT6, WNT7B, WNT8B, WNT10A, WNT2B, WNT9A, WNT9B, WNT16
Wnt-activated receptor activity 21 11 30.50 0.0e+00 0.0e+00 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10
Wnt-protein binding 28 11 26.30 4.0e-20 3.3e-19 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10
receptor agonist activity 15 9 29.50 1.3e-19 8.8e-19 WNT5A, WNT3A, WNT1, WNT3, WNT7A, WNT4, WNT2, WNT8A, WNT10B
protein disulfide oxidoreductase activity 23 6 15.70 5.6e-11 3.2e-10 TXNRD1, TXNRD3, TXN2, TXNDC2, TXNDC8, TXNL1
PDZ domain binding 86 6 7.60 9.1e-07 4.2e-06 FZD4, FZD1, FZD8, FZD2, FZD7, FZD3
G-protein coupled receptor activity 643 12 4.18 1.4e-04 5.4e-04 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10, LGR6
G-protein coupled receptor binding 45 3 5.22 1.6e-04 5.6e-04 RSPO3, RSPO2, RSPO4
flavin adenine dinucleotide binding 62 3 4.28 5.5e-04 1.7e-03 TXNRD1, TXNRD3, TXNRD2
microtubule motor activity 68 3 4.03 7.9e-04 2.0e-03 DNAH11, DNAH5, DNAI2

5.4.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=20
DD_net <- xPrioritiserSubnet(pNode=pDD, priority.quantite=0.1, subnet.size=21, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- DD_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Basal cell carcinoma 55 9 25.80 3.5e-18 4.9e-17 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Class B/2 (Secretin family receptors) 87 9 20.40 4.7e-16 2.8e-15 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Genes related to Wnt-mediated signal transduction 89 9 20.20 5.9e-16 2.8e-15 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, SFRP4
Melanogenesis 101 9 18.90 2.2e-15 6.2e-15 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Wnt signaling pathway 151 10 17.10 1.8e-15 6.2e-15 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10, SFRP4
Wnt signaling network 28 6 24.20 7.4e-14 1.7e-13 WNT2, WNT7B, FZD4, FZD7, FZD8, FZD10
Pathways in cancer 325 11 12.50 2.2e-13 4.4e-13 EGFR, WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10, MMP9
Hedgehog signaling pathway 56 5 14.10 1.3e-09 2.3e-09 WNT2, WNT6, WNT7B, WNT11, WNT10A
GPCR ligand binding 400 9 8.95 2.2e-09 3.5e-09 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Signaling by GPCR 906 10 6.06 4.7e-07 6.5e-07 EGFR, WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 3 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
Cell surface 467 6 5.0 2.9e-05 5.8e-05 SFRP4, FZD10, WNT6, FZD4, RSPO2, TIMP2
G protein coupled receptor 855 4 1.6 3.4e-02 3.4e-02 FZD10, FZD4, FZD7, FZD8
NA NA NA NA NA NA NA

6 Ontology annotations & TPN nominations

In this section, we first describe annotation predictors using ontologies and target genes nominated by ULTRA-DD TPN members, and then integrate them as an additional support for disease-specific genetic prioritisation by Pi.

6.1 Annotation predictors

Using ontologies to annotate genes is one of the most effective ways to reuse existing knowledge. Also it is a scalable way to capture a particular knowledge sphere in an systematic way. An ontology is like a dictionary, containing vocabularies (called terms) and their relations. These terms and relations are understandable by humans, and readable by computers. According to how to organise relationships between terms, there are types of ontologies:

  1. structured ontology: terms are organised in a tree-like structure, such as Gene Ontology (Ashburner et al. 2000), Disease Ontology (Schriml et al. 2012), and Phenotype Ontologies in human and mouse (Köhler et al. 2013; C. L. Smith and Eppig 2009);
  2. non-structured ontology: terms are not organised as tree but simply listed as keywords, such as a collection of pathways, and gene druggable categories.

When integrating ontologies of different knowledge spheres, the use is limited by the heterogeneous nature: different terms/keywords represent the same or similar knowledge. For this reason, they are better used in a broader context to capture a general category of knowledge, eg a group of diseases instead of a specific disease. We consider the context broadly relevant to immune-related diseases. We choose 5 annotation predictors as informative carriers generalising their relatedness to immunity/inflammation diseases. For a given gene, we look at:

  1. OMIM evidence: whether it was identified in OMIM as an immune disease ontology gene;
  2. Phenotype evidence: whether it is annotated to immune-related phenotypes from Human Phenotype Ontology (HPO) and/or Mouse Phenotype Ontology (MPO);
  3. Function evidence: whether it is annotated to immune/inflammatory terms from Gene Ontology;
  4. Pathway evidence: whether it is annotated to immune-related pathways (from KEGG, BioCarta and Reactome);
  5. Druggable evidence: whether it is annotated to gene druggable categories from DGIdb.

Since the above predictors are relatively independent to each other, additive scoring scheme is used to calculate annotation scores per gene.

Import annotation predictors (pre-computed)

data.file <- "http://galahad.well.ox.ac.uk/bigdata/iAnno.txt"
iAnno <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
# gene/target info
iSymbol <- iAnno[,2]
iGenes <- iAnno[,1:3]
# predictor info (including individual scores)
iPS <- iAnno[,22:26]
# additive predictor score
PS <- apply(iPS, 1, sum)
# combine additive predictor score and individual acores
iPS <- cbind(PS=PS, iPS)

The first 5 rows of the data frame iGenes are:

Target.GeneID Target.Symbol Target.Name
1 A1BG alpha-1-B glycoprotein
2 A2M alpha-2-macroglobulin
3 A2MP1 alpha-2-macroglobulin pseudogene 1
9 NAT1 N-acetyltransferase 1 (arylamine N-acetyltransferase)
10 NAT2 N-acetyltransferase 2 (arylamine N-acetyltransferase)

Their predictor info is stored in the data frame iPS, with the column PS for additive predictor scores:

PS OMIM Phenotype Function Pathway Druggable
0 0 0 0 0 0
1 0 0 0 0 1
0 0 0 0 0 0
0 0 0 0 0 0
1 1 0 0 0 0

Distributions of annotation predictor scores

df <- iPS
p <- ggplot(df, aes(PS))
p + geom_bar(fill="deepskyblue") + scale_y_log10() + geom_text(stat="bin",binwidth=1,aes(label=..count..),vjust=0,color="blue") + ylab("Number of genes (predicted)") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + scale_x_continuous(breaks=0:max(df$PS)) + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue"))

Pairwise correlations between predictors

df <- df[df$PS>0, -1]
res <- supraHex::sDistance(t(df), metric="pearson")
diag(res) <- 1
cellnote <- signif(1-res, digits=2)
diag(cellnote) <- NA
visHeatmapAdv(1-res, Rowv=F, Colv=F, colormap="bwr", zlim=c(-1,1), key=F, cexRow=1, cexCol=1, cellnote=cellnote, notecex=0.8, notecol="black", lhei=c(1,5), lwid=c(1,5))

6.2 TPN nominations

# TPN nomination info
iTPN <- iAnno[,9:15]
# flag whether nominated by ULTRA-DD TPN members
iNominated <- iAnno[,9]
# info provided by SGC Oxford
iSGC <- iAnno[,4:8]
# iSGC restricted to nominated targets
df <- iSGC[iNominated>0, ]
df$SGC_PI <- !(df$SGC_PI %in% c("Unassigned","Unassigned - TO",""))

Gene family analysis

Below show results of family analysis (provided by Brian and David).

The top family is integral membrane proteins (IMP), followed by histone proteins (histone lysine demethylase; KDM) and ubiquitin-related proteins. Other top families include DENN domain proteins.

Allocations to SGC PIs

As seen below, 714 genes with 490 (68%) have been allocation to specific SGC-Oxford PIs.

SGC PI allocations are conditioned on SG/ChemBio Tractability:

Generally speaking, targets with good tractability are more likely to be allocated to SGC PIs.

Distributions of annotation predictor scores for nominated genes

Distributions of annotation predictor scores are conditioned on whether genes are nominated or not:

It can be seen that nominated genes tend to receive a higher annotation scores than non-nominated genes do.

6.3 Integration for Ankylosing Spondylitis

Recall: AS genes prioritised by Pi are stored in the data frame AS_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(AS_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine AS_genes iPS iGenes iNominated
AS_combined <- cbind(AS_genes[,c(1,4)], genes_ap)

The top 20 AS-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
CAST 0.08460 1 0 0 0 0 1 0
ERAP2 0.03626 0 0 0 0 0 0 0
ERAP1 0.03276 2 0 0 1 1 0 0
B3GNT2 0.02077 0 0 0 0 0 0 0
IL23R 0.01823 3 1 1 1 0 0 0
LNPEP 0.01696 1 0 0 0 1 0 0
DDX39B 0.01284 1 0 0 0 0 1 0
IL12RB2 0.01197 1 0 1 0 0 0 0
ETS2 0.01160 0 0 0 0 0 0 0
PSMG1 0.01091 0 0 0 0 0 0 0
TBKBP1 0.01071 0 0 0 0 0 0 0
NFKBIL1 0.01039 1 1 0 0 0 0 0
KIF21B 0.01008 0 0 0 0 0 0 0
GPR65 0.00969 1 0 0 1 0 0 1
C1orf106 0.00854 0 0 0 0 0 0 0
IL6R 0.00747 5 1 1 1 1 1 0
TNFRSF1A 0.00717 4 1 1 1 0 1 1
NPEPPS 0.00670 2 0 0 0 1 1 0
KPNB1 0.00666 1 0 0 0 1 0 0
GALC 0.00654 1 0 1 0 0 0 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving AS-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(AS_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(AS_combined$priority,1) # genetic priority at the top
p <- ggplot(AS_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("AS-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 43 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 126 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone. The origin of the wording a2maid is explained here.1
  • Top-left: 586 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to AS (maybe other immune diseases).

The 43 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
IL6R interleukin 6 receptor 0.00747 5 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00717 4 1
CARD9 caspase recruitment domain family, member 9 0.00608 4 0
TYK2 tyrosine kinase 2 0.00396 4 0
IFIH1 interferon induced with helicase C domain 1 0.00390 4 0
CD27 CD27 molecule 0.00331 4 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00277 4 0
IL12B interleukin 12B 0.00255 5 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00252 4 0
IL2RA interleukin 2 receptor, alpha 0.00250 5 1
ICAM1 intercellular adhesion molecule 1 0.00183 4 0
NCF4 neutrophil cytosolic factor 4, 40kDa 0.00166 4 0
AIRE autoimmune regulator 0.00096 4 0
ADAR adenosine deaminase, RNA-specific 0.00090 4 1
HLA-B major histocompatibility complex, class I, B 0.00075 4 0
HLA-A major histocompatibility complex, class I, A 0.00067 5 0
NOTCH1 notch 1 0.00064 4 0
B2M beta-2-microglobulin 0.00061 5 0
IFNG interferon, gamma 0.00023 4 0
STAT4 signal transducer and activator of transcription 4 0.00021 4 0
IL12A interleukin 12A 0.00021 4 0
JAK2 Janus kinase 2 0.00019 5 0
IL1B interleukin 1, beta 0.00015 4 0
IL2 interleukin 2 0.00014 4 0
IL10 interleukin 10 0.00014 5 0
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 0.00014 5 0
JAK3 Janus kinase 3 0.00014 4 0
MALT1 MALT1 paracaspase 0.00013 4 0
STAT3 signal transducer and activator of transcription 3 (acute-phase response factor) 0.00013 4 0
IL1RN interleukin 1 receptor antagonist 0.00012 5 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00011 5 0
PIK3R1 phosphoinositide-3-kinase, regulatory subunit 1 (alpha) 0.00011 5 0
CBL Cbl proto-oncogene, E3 ubiquitin protein ligase 0.00011 4 0
PIK3CA phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha 0.00011 4 1
IL6 interleukin 6 0.00011 5 0
NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha 0.00011 5 0
CD3E CD3e molecule, epsilon (CD3-TCR complex) 0.00010 5 0
BCL10 B-cell CLL/lymphoma 10 0.00009 4 0
IL7R interleukin 7 receptor 0.00008 5 0
AKT1 v-akt murine thymoma viral oncogene homolog 1 0.00008 5 0
FASLG Fas ligand (TNF superfamily, member 6) 0.00008 4 0
FOS FBJ murine osteosarcoma viral oncogene homolog 0.00008 4 0
RAF1 Raf-1 proto-oncogene, serine/threonine kinase 0.00008 4 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ERAP1 endoplasmic reticulum aminopeptidase 1 0.03276 2 0
IL23R interleukin 23 receptor 0.01823 3 0
NPEPPS aminopeptidase puromycin sensitive 0.00670 2 0
CACNA1S calcium channel, voltage-dependent, L type, alpha 1S subunit 0.00605 2 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 0.00557 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 0.00543 2 0
FCGR2B Fc fragment of IgG, low affinity IIb, receptor (CD32) 0.00450 3 0
ITGB3 integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) 0.00401 3 0
TBX21 T-box 21 0.00395 2 0
NOS2 nitric oxide synthase 2, inducible 0.00388 3 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
CAST calpastatin 0.08460 1 0
ERAP2 endoplasmic reticulum aminopeptidase 2 0.03626 0 0
B3GNT2 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 0.02077 0 0
LNPEP leucyl/cystinyl aminopeptidase 0.01696 1 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 0.01284 1 0

Label the gene network with annotation prediction

The nodes/genes in AS-specific gene network are color-coded by annotation additive predictor scores.

g <- AS_net
evi <- AS_combined[,3]
names(evi) <- rownames(AS_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- AS_net
evi <- AS_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

6.4 Integration for Spondyloarthritis

Recall: SA genes prioritised by Pi are stored in the data frame SA_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(SA_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SA_genes iPS iGenes iNominated
SA_combined <- cbind(SA_genes[,c(1,4)], genes_ap)

The top 20 SA-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
CAST 0.02679 1 0 0 0 0 1 0
DDX39B 0.02230 1 0 0 0 0 1 0
ATF6B 0.01918 0 0 0 0 0 0 0
NFKBIL1 0.01806 1 1 0 0 0 0 0
IER3 0.01174 0 0 0 0 0 0 0
ERAP2 0.01148 0 0 0 0 0 0 0
ERAP1 0.01037 2 0 0 1 1 0 0
TRAF3IP2 0.01030 1 0 1 0 0 0 0
MDC1 0.00998 0 0 0 0 0 0 0
TNIP1 0.00967 0 0 0 0 0 0 0
ANXA6 0.00925 0 0 0 0 0 0 0
VARS2 0.00913 0 0 0 0 0 0 1
LCE3D 0.00903 0 0 0 0 0 0 0
LCE3A 0.00849 0 0 0 0 0 0 0
LCE3B 0.00818 0 0 0 0 0 0 0
LCE3C 0.00802 0 0 0 0 0 0 0
LCE3E 0.00751 0 0 0 0 0 0 0
TYK2 0.00672 4 1 1 0 1 1 0
B3GNT2 0.00657 0 0 0 0 0 0 0
CNPY2 0.00652 0 0 0 0 0 0 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving SA-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(SA_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SA_combined$priority,1) # genetic priority at the top
p <- ggplot(SA_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("Spondyloarthritis-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 49 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 102 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.
  • Top-left: 604 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to Spondyloarthritis (maybe other immune diseases).

The 49 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
TYK2 tyrosine kinase 2 0.00672 4 0
IL12B interleukin 12B 0.00518 5 0
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 0.00392 4 0
ICAM1 intercellular adhesion molecule 1 0.00390 4 0
IL6R interleukin 6 receptor 0.00243 5 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00233 4 1
NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha 0.00217 5 0
DDX58 DEAD (Asp-Glu-Ala-Asp) box polypeptide 58 0.00215 4 0
CARD9 caspase recruitment domain family, member 9 0.00194 4 0
IFIH1 interferon induced with helicase C domain 1 0.00114 4 0
CD27 CD27 molecule 0.00105 4 0
STAT3 signal transducer and activator of transcription 3 (acute-phase response factor) 0.00094 4 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00091 4 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00089 4 0
IL2RA interleukin 2 receptor, alpha 0.00085 5 1
C4A complement component 4A (Rodgers blood group) 0.00056 4 0
NCF4 neutrophil cytosolic factor 4, 40kDa 0.00055 4 0
TNF tumor necrosis factor 0.00044 4 0
CFB complement factor B 0.00042 5 0
AIRE autoimmune regulator 0.00034 4 0
MAPK1 mitogen-activated protein kinase 1 0.00031 4 0
ADAR adenosine deaminase, RNA-specific 0.00031 4 1
PSMB8 proteasome (prosome, macropain) subunit, beta type, 8 0.00030 5 0
HLA-B major histocompatibility complex, class I, B 0.00027 4 0
HLA-A major histocompatibility complex, class I, A 0.00025 5 0
B2M beta-2-microglobulin 0.00024 5 0
NOTCH1 notch 1 0.00022 4 0
IKBKG inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma 0.00022 5 0
STAT5B signal transducer and activator of transcription 5B 0.00019 4 0
PTPN11 protein tyrosine phosphatase, non-receptor type 11 0.00019 4 0
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 0.00017 4 0
CHUK conserved helix-loop-helix ubiquitous kinase 0.00015 4 0
IL12A interleukin 12A 0.00014 4 0
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 0.00013 5 0
IKBKB inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta 0.00013 5 0
IFNG interferon, gamma 0.00013 4 0
STAT4 signal transducer and activator of transcription 4 0.00012 4 0
JAK2 Janus kinase 2 0.00012 5 0
MALT1 MALT1 paracaspase 0.00012 4 0
CD40 CD40 molecule, TNF receptor superfamily member 5 0.00012 5 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00011 5 0
HLA-DRB1 major histocompatibility complex, class II, DR beta 1 0.00011 4 0
IL2 interleukin 2 0.00011 4 0
IL1B interleukin 1, beta 0.00011 4 0
RAG1 recombination activating gene 1 0.00010 4 0
PRKDC protein kinase, DNA-activated, catalytic polypeptide 0.00010 4 1
EGFR epidermal growth factor receptor 0.00009 4 0
FOS FBJ murine osteosarcoma viral oncogene homolog 0.00009 4 0
JAK3 Janus kinase 3 0.00009 4 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ERAP1 endoplasmic reticulum aminopeptidase 1 0.01037 2 0
ICAM3 intercellular adhesion molecule 3 0.00594 2 0
IL23R interleukin 23 receptor 0.00584 3 0
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 0.00537 2 1
FYN FYN proto-oncogene, Src family tyrosine kinase 0.00450 3 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 0.00409 2 0
HLA-E major histocompatibility complex, class I, E 0.00344 2 0
NOS2 nitric oxide synthase 2, inducible 0.00296 3 0
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 0.00243 3 1
NPEPPS aminopeptidase puromycin sensitive 0.00213 2 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
CAST calpastatin 0.02679 1 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 0.02230 1 0
ATF6B activating transcription factor 6 beta 0.01918 0 0
NFKBIL1 nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 0.01806 1 0
IER3 immediate early response 3 0.01174 0 0

Label the gene network with annotation prediction

The nodes/genes in Spondyloarthritis-specific gene network are color-coded by annotation additive predictor scores.

g <- SA_net
evi <- SA_combined[,3]
names(evi) <- rownames(SA_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- SA_net
evi <- SA_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

6.5 Integration for Systemic Lupus Erythematosus

Recall: SLE genes prioritised by Pi are stored in the data frame SLE_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(SLE_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SLE_genes iPS iGenes iNominated
SLE_combined <- cbind(SLE_genes[,c(1,4)], genes_ap)

The top 20 SLE-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
ATF6B 0.04358 0 0 0 0 0 0 0
NELFE 0.02335 0 0 0 0 0 0 0
VARS2 0.02029 0 0 0 0 0 0 1
ITGAM 0.01884 3 1 1 1 0 0 1
PBX2 0.01783 0 0 0 0 0 0 0
PYCARD 0.01753 2 0 0 1 1 0 0
CUEDC2 0.01746 0 0 0 0 0 0 0
STAT4 0.01707 4 1 1 0 1 1 0
HLA-DPB1 0.01576 2 1 0 0 1 0 0
IRF5 0.01540 3 1 1 0 1 0 1
TNPO3 0.01533 1 1 0 0 0 0 0
CLIC1 0.01527 0 0 0 0 0 0 0
PSMB9 0.01482 3 0 0 1 1 1 0
RNF114 0.01427 0 0 0 0 0 0 0
C2 0.01354 2 0 0 1 1 0 0
LTA 0.01303 3 1 0 0 1 1 0
SKIV2L 0.01275 0 0 0 0 0 0 0
NCF2 0.01219 3 1 0 1 1 0 0
HNRNPM 0.01217 0 0 0 0 0 0 0
SMG7 0.01137 0 0 0 0 0 0 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving SLE-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(SLE_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SLE_combined$priority,1) # genetic priority at the top
p <- ggplot(SLE_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("SLE-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 47 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 108 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.
  • Top-left: 600 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to SLE (maybe other immune diseases).

The 47 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
STAT4 signal transducer and activator of transcription 4 0.01707 4 0
C4A complement component 4A (Rodgers blood group) 0.00709 4 0
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 0.00570 4 0
PTPN22 protein tyrosine phosphatase, non-receptor type 22 (lymphoid) 0.00365 4 2
TYK2 tyrosine kinase 2 0.00217 4 0
CFB complement factor B 0.00208 5 0
IRF8 interferon regulatory factor 8 0.00172 4 0
IFIH1 interferon induced with helicase C domain 1 0.00157 4 0
IRF7 interferon regulatory factor 7 0.00139 4 0
CDKN1B cyclin-dependent kinase inhibitor 1B (p27, Kip1) 0.00123 4 0
TNF tumor necrosis factor 0.00113 4 0
HRAS Harvey rat sarcoma viral oncogene homolog 0.00107 4 0
IL10 interleukin 10 0.00093 5 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00084 4 0
PNP purine nucleoside phosphorylase 0.00081 4 0
IL12A interleukin 12A 0.00071 4 0
ICAM1 intercellular adhesion molecule 1 0.00065 4 0
MAPK1 mitogen-activated protein kinase 1 0.00042 4 0
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 0.00030 4 0
CHUK conserved helix-loop-helix ubiquitous kinase 0.00028 4 0
CARD9 caspase recruitment domain family, member 9 0.00026 4 0
MASP2 mannan-binding lectin serine peptidase 2 0.00026 5 0
CIITA class II, major histocompatibility complex, transactivator 0.00024 4 1
C1QA complement component 1, q subcomponent, A chain 0.00021 4 0
HLA-DRB1 major histocompatibility complex, class II, DR beta 1 0.00020 4 0
BCL2 B-cell CLL/lymphoma 2 0.00020 5 0
CYBB cytochrome b-245, beta polypeptide 0.00020 5 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00019 4 1
PSTPIP1 proline-serine-threonine phosphatase interacting protein 1 0.00019 4 0
MEFV Mediterranean fever 0.00018 4 0
IKBKG inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma 0.00017 5 0
NLRP3 NLR family, pyrin domain containing 3 0.00016 4 1
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00014 4 0
MYD88 myeloid differentiation primary response 88 0.00014 4 0
IL12B interleukin 12B 0.00013 5 0
IL1B interleukin 1, beta 0.00013 4 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00013 5 0
PDGFRA platelet-derived growth factor receptor, alpha polypeptide 0.00013 4 0
CREBBP CREB binding protein 0.00012 4 1
C3 complement component 3 0.00012 5 0
PSMB8 proteasome (prosome, macropain) subunit, beta type, 8 0.00012 5 0
IL2RA interleukin 2 receptor, alpha 0.00012 5 1
PRKDC protein kinase, DNA-activated, catalytic polypeptide 0.00012 4 1
IFNG interferon, gamma 0.00011 4 0
IKBKB inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta 0.00011 5 0
ITK IL2-inducible T-cell kinase 0.00010 5 0
TLR2 toll-like receptor 2 0.00010 4 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 0.01884 3 1
PYCARD PYD and CARD domain containing 0.01753 2 0
HLA-DPB1 major histocompatibility complex, class II, DP beta 1 0.01576 2 0
IRF5 interferon regulatory factor 5 0.01540 3 1
PSMB9 proteasome (prosome, macropain) subunit, beta type, 9 0.01482 3 0
C2 complement component 2 0.01354 2 0
LTA lymphotoxin alpha 0.01303 3 0
NCF2 neutrophil cytosolic factor 2 0.01219 3 0
STAT1 signal transducer and activator of transcription 1, 91kDa 0.00620 3 0
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 0.00352 2 1

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ATF6B activating transcription factor 6 beta 0.04358 0 0
NELFE negative elongation factor complex member E 0.02335 0 0
VARS2 valyl-tRNA synthetase 2, mitochondrial 0.02029 0 1
PBX2 pre-B-cell leukemia homeobox 2 0.01783 0 0
CUEDC2 CUE domain containing 2 0.01746 0 0

Label the gene network with annotation prediction

The nodes/genes in SLE-specific gene network are color-coded by annotation additive predictor scores.

g <- SLE_net
evi <- SLE_combined[,3]
names(evi) <- rownames(SLE_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- SLE_net
evi <- SLE_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

7 Cross-disease comparisons

7.1 Gene-level comparisons

Recall: disease-centric genetic prioritisation for genes by Pi are stored respectively in three data frames, that is, AS_genes for Ankylosing Spondylitis, SA_genes for Spondyloarthritis and SLE_genes for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.

# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, AS_genes$name)
df_AS <- AS_genes[ind,]
# Spondyloarthritis (SA)
ind <- match(iSymbol, SA_genes$name)
df_SA <- SA_genes[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, SLE_genes$name)
df_SLE <- SLE_genes[ind,]

# Combine into a data frame 'df_rank' and `df_priority`
## rank
df_rank <- cbind(AS=df_AS$rank, SA=df_SA$rank, SLE=df_SLE$rank)
## priority
df_priority <- cbind(AS=df_AS$priority, SA=df_SA$priority, SLE=df_SLE$priority)
rownames(df_rank) <- rownames(df_priority) <- iSymbol

Venn diagram of the top 200 genes across diseases

ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.15)
grid.draw(vp)

The 14 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
UBE2L3 ubiquitin-conjugating enzyme E2L 3 1 1
IFIH1 interferon induced with helicase C domain 1 4 0
TYK2 tyrosine kinase 2 4 0
FCGR2B Fc fragment of IgG, low affinity IIb, receptor (CD32) 3 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 2 0
ICAM3 intercellular adhesion molecule 3 2 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 1 0
LTBR lymphotoxin beta receptor (TNFR superfamily, member 3) 1 0
NFKBIL1 nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 1 0
CDC37 cell division cycle 37 0 0
FDX1L ferredoxin 1-like 0 0
HSPA6 heat shock 70kDa protein 6 (HSP70B’) 0 0
RAVER1 ribonucleoprotein, PTB-binding 1 0 0
RLTPR RGD motif, leucine rich repeats, tropomodulin domain and proline-rich containing 0 0

Venn diagram of the top 200 genes in any diseases vs 714 TPN nominated genes

ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
data$Prioritised <- base::Reduce(union, data[1:3])
data$Nominated <- iSymbol[which(iNominated>0)]
category.names <- c('Prioritised', 'Nominated')
vp <- venn.diagram(x=data[4:5], filename=NULL, fill=c("red","orange"), category.names=category.names, margin=0.15, cat.pos=1)
grid.draw(vp)

The 30 genes common to disease-specific genetic prioritisations and TPN nominations (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
PXK PX domain containing serine/threonine kinase 1 4
PTPN22 protein tyrosine phosphatase, non-receptor type 22 (lymphoid) 4 2
IL2RA interleukin 2 receptor, alpha 5 1
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 4 1
BLK BLK proto-oncogene, Src family tyrosine kinase 3 1
IRF5 interferon regulatory factor 5 3 1
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 3 1
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 2 1
BANK1 B-cell scaffold protein with ankyrin repeats 1 1 1
GPR65 G protein-coupled receptor 65 1 1
IL23A interleukin 23, alpha subunit p19 1 1
KAT8 K(lysine) acetyltransferase 8 1 1
SCNN1A sodium channel, non voltage gated 1 alpha subunit 1 1
SLC11A1 solute carrier family 11 (proton-coupled divalent metal ion transporter), member 1 1 1
SLC15A4 solute carrier family 15 (oligopeptide transporter), member 4 1 1
SPHK2 sphingosine kinase 2 1 1
UBE2L3 ubiquitin-conjugating enzyme E2L 3 1 1
APOBEC4 apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 4 (putative) 0 1
BRWD1 bromodomain and WD repeat domain containing 1 0 1
CCDC101 coiled-coil domain containing 101 0 1
MBTPS2 membrane-bound transcription factor peptidase, site 2 0 1
ORAI3 ORAI calcium release-activated calcium modulator 3 0 1
PARP11 poly (ADP-ribose) polymerase family, member 11 0 1
RAB5A RAB5A, member RAS oncogene family 0 1
RAB5B RAB5B, member RAS oncogene family 0 1
RIOK2 RIO kinase 2 0 1
SLC9A8 solute carrier family 9, subfamily A (NHE8, cation proton antiporter 8), member 8 0 1
SMPD2 sphingomyelin phosphodiesterase 2, neutral membrane (neutral sphingomyelinase) 0 1
TET3 tet methylcytosine dioxygenase 3 0 1
VARS2 valyl-tRNA synthetase 2, mitochondrial 0 1

7.2 Pathway-level comparisons

Recall: disease-centric genetic prioritisation for pathways by Pi are stored respectively in three data frames, that is, AS_pathways for Ankylosing Spondylitis, SA_pathways for Spondyloarthritis and SLE_pathways for Systemic Lupus Erythematosus as exemplars.

Load all pathways into a data frame iPathway:

org.Hs.egMsigdbC2CPall <- xRDataLoader(RData.customised='org.Hs.egMsigdbC2CPall', RData.location=RData.location)
iPathway <- org.Hs.egMsigdbC2CPall$set_info[, c(1,2)]
# Ankylosing Spondylitis (AS)
ind <- match(rownames(iPathway), rownames(AS_pathways))
df_AS <- AS_pathways[ind,]
# Spondyloarthritis (SA)
ind <- match(rownames(iPathway), rownames(SA_pathways))
df_SA <- SA_pathways[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(rownames(iPathway), rownames(SLE_pathways))
df_SLE <- SLE_pathways[ind,]

# Combine into a data frame 'df_adjp'
## adjusted p-values
df_adjp <- cbind(AS=df_AS$adjp, SA=df_SA$adjp, SLE=df_SLE$adjp)
rownames(df_adjp) <- iPathway$name

This barplot displays prioritised pathways for three diseases, in which horizontal lines are used to indicate three groups of pathways prioritised:

  • Top panel: pathways common to all diseases, including IL23-mediated signaling events, IL27-mediated signaling events.
  • Middle panel: pathways common to any two diseases.
  • Bottom panel: pathways unique to individual diseases.

7.3 Network-level comparisons

Recall: the gene network identified by Pi are stored respectively in three data frames, that is, AS_net for Ankylosing Spondylitis, SA_net for Spondyloarthritis and SLE_net for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.

# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, V(AS_net)$name)
df_AS <- V(AS_net)$name[ind]
# Spondyloarthritis (SA)
ind <- match(iSymbol, V(SA_net)$name)
df_SA <- V(SA_net)$name[ind]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, V(SLE_net)$name)
df_SLE <- V(SLE_net)$name[ind]

# Combine into a data frame 'df_net'
df_net <- cbind(AS=df_AS, SA=df_SA, SLE=df_SLE)
rownames(df_net) <- iSymbol

Venn diagram of network genes across diseases

data <- list()
data$AS <- iSymbol[!is.na(df_net[,1])]
data$SA <- iSymbol[!is.na(df_net[,2])]
data$SLE <- iSymbol[!is.na(df_net[,3])]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.2)
grid.draw(vp)

The common part of the gene network shared by AS, SA and SLE

# identify common parts of SLE-network and AS-network. 
net_AS_SA_SLE <- graph.intersection(AS_net, SA_net, SLE_net, keep.all.vertices=F)

# append genes with info about annotation predictors  
g <- net_AS_SA_SLE
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]

The 3 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
TYK2 tyrosine kinase 2 4 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
CDC37 cell division cycle 37 0 0

The network nodes/genes shared by three diseases are color-coded by annotation additive predictor scores.

# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

The common part of the gene network shared by AS and SA

# identify common parts of AS-network and SA-network. 
net_AS_SA <- graph.intersection(AS_net, SA_net, keep.all.vertices=F)

# append genes with info about annotation predictors  
g <- net_AS_SA
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]

The common 21 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 4 1
SCNN1A sodium channel, non voltage gated 1 alpha subunit 1 1
IL12B interleukin 12B 5 0
IL6R interleukin 6 receptor 5 0
ICAM1 intercellular adhesion molecule 1 4 0
TYK2 tyrosine kinase 2 4 0
IL23R interleukin 23 receptor 3 0
ITGB3 integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) 3 0
NOS2 nitric oxide synthase 2, inducible 3 0
ERAP1 endoplasmic reticulum aminopeptidase 1 2 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
NPEPPS aminopeptidase puromycin sensitive 2 0
TBX21 T-box 21 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
CAST calpastatin 1 0
IL12RB2 interleukin 12 receptor, beta 2 1 0
LNPEP leucyl/cystinyl aminopeptidase 1 0
CDC37 cell division cycle 37 0 0
MRPL4 mitochondrial ribosomal protein L4 0 0
RUNX3 runt-related transcription factor 3 0 0

The network nodes/genes shared by AS and SA are color-coded by annotation additive predictor scores.

# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

The common part of the gene network shared by SLE and SA

The common 10 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 3 1
TYK2 tyrosine kinase 2 4 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
ETS1 v-ets avian erythroblastosis virus E26 oncogene homolog 1 1 0
SQSTM1 sequestosome 1 1 0
TRIM56 tripartite motif containing 56 1 0
CDC37 cell division cycle 37 0 0
ITGAX integrin, alpha X (complement component 3 receptor 4 subunit) 0 0
RPL34 ribosomal protein L34 0 0

The network nodes/genes shared by SLE and SA are color-coded by annotation additive predictor scores.

The common part of the gene network shared by SLE and AS

The common 4 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
TYK2 tyrosine kinase 2 4 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
SOCS1 suppressor of cytokine signaling 1 1 0
CDC37 cell division cycle 37 0 0

The network nodes/genes shared by SLE and AS are color-coded by annotation additive predictor scores.

8 References

Below is a list of references that this work stands on:

1000 Genomes Project Consortium. 2012. “An integrated map of genetic variation from 1,092 human genomes.” Nature 135 (V): 0–9. doi:10.1038/nature11632.

Ashburner, M, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, et al. 2000. “Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.” Nat Genet 25 (1): 25–29. doi:10.1038/75556.

Cerami, E. G., B. E. Gross, E. Demir, I. Rodchenkov, O. Babur, N. Anwar, N. Schultz, G. D. Bader, and C. Sander. 2011. “Pathway Commons, a web resource for biological pathway data.” Nucleic Acids Research 39 (Database): D685–D690. doi:10.1093/nar/gkq1039.

Fairfax, Benjamin P, Peter Humburg, Seiko Makino, Vivek Naranbhai, Daniel Wong, Evelyn Lau, Luke Jostins, et al. 2014. “Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.” Science (New York, N.Y.) 343 (March): 1246949. doi:10.1126/science.1246949.

Fang, Hai, and Julian Gough. 2014. “The ’dnet’ approach promotes emerging research on cancer patient survival.” Genome Medicine 6 (8): 64. doi:10.1186/s13073-014-0064-8.

Köhler, Sebastian, Sandra C Doelken, Christopher J Mungall, Sebastian Bauer, Helen V Firth, Isabelle Bailleul-Forestier, Graeme C M Black, et al. 2013. “The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.” Nucleic Acids Research, November, 1–9. doi:10.1093/nar/gkt1026.

Schriml, L M, C Arze, S Nadendla, Y W Chang, M Mazaitis, V Felix, G Feng, and W A Kibbe. 2012. “Disease Ontology: a backbone for disease semantic integration.” Nucleic Acids Res 40 (Database issue): D940–6. doi:10.1093/nar/gkr972.

Smith, C L, and J T Eppig. 2009. “The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis.” Wiley Interdiscip Rev Syst Biol Med 1 (3): 390–99. doi:10.1002/wsbm.44.

Szklarczyk, Damian, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-cepas, Milan Simonovic, et al. 2015. “STRING v10 : protein – protein interaction networks , integrated over the tree of life.” Nucleic Acids Res 43 (Database): D447–D452. doi:10.1093/nar/gku1003.

Welter, Danielle, Jacqueline MacArthur, Joannella Morales, Tony Burdett, Peggy Hall, Heather Junkins, Alan Klemm, et al. 2014. “The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.” Nucleic Acids Research 42 (D1): 1001–6. doi:10.1093/nar/gkt1229.


  1. The wording a2maid comes from the mermaid (describing the maid of the sea having a fish’s tail), and instead of the maid of the sea, replacing with a2 turns to be the maid of our lab having two PhD students Anna Sanniti and Alicia Lledo Lara.